CRESST REPORT 742 



Joan L. Herman 
Kyo Yamashiro 
Sloane Lefkowitz 
Lee Ann Trusela 



EXPLORING DATA USE AND 
SCHOOL PERFORMANCE IN AN 
URBAN PUBLIC SCHOOL DISTRICT 



SEPTEMBER, 2008 




National Center for Research on Evaluation, Standards, and Student Testing 

Graduate School of Education & Information Studies 
UCLA University of California, Los Angeles 






National Center for Research on Evaluation, 
Standards, and Student Testing (CRESST) 
Center for the Study of Evaluation (CSE) 
Graduate School of Education & Information Studies 
University of California, Los Angeles 
300 Charles E. Young Drive North 
GSE&IS Bldg., Box 951522 
Los Angeles, CA 90095-1522 
(310) 206-1532 



Copyright © 2008 The Regents of the University of California 

The work and research reported herein was commissioned and supported by the Stuart Foundation, Award 
Number 0008258-001. 



The findings and opinions expressed herein are those of the authors and do not necessarily reflect the positions 
or policies of the Stuart Foundation. 



EXPLORING DATA USE AND SCHOOL PERFORMANCE
IN AN URBAN PUBLIC SCHOOL DISTRICT¹

Evaluation of Seattle Public Schools’ 
Comprehensive Value-Added Assessment System 



Joan L. Herman, Kyo Yamashiro, Sloane Lefkowitz, and Lee Ann Trusela 
CRESST/University of California, Los Angeles 

Abstract 

This study examined the relationship between data use and achievement at 13 urban Title 
I schools. Using multiple methods, including test scores, district surveys, school 
transformation plans, and four case study site visits, the researchers found wide variation 
in the use of data to inform instruction and planning. In some cases, schools were 
overwhelmed with the amount of data or were not convinced that alternating test score 
data from two different tests provided dependable information. The researchers did not 
find a substantial link between data use and achievement, which may have been a result 
of the small sample size or different implementation methods across schools. Teachers
and principals identified needs for more timely data delivery, individual rather than
group data reports, and better training in assessment and data analysis.



INTRODUCTION 

Data use and evidence-based practices are at the heart of current school reform efforts. 
Fueled by the Federal No Child Left Behind Act (NCLB, 2002) and state and district 
policies, schools and the administrators and teachers within them are expected to use 
assessment data to identify student needs, to formulate school goals, to plan and implement 
educational strategies to achieve those goals, to monitor progress, and to continue to revise 
and refine their efforts to improve the academic performance of the school as a whole and 
each of the subgroups and individual students that comprise it. The logic is appealing: 
schools will use data to engage in a continuous improvement process, and indeed the cycle is 
one that has been long advocated for education (see for example, Tyler, 1949) and more 
recently in business (Deming, 1982), where the power and success of “learning 
organizations” that infuse data throughout their decision making processes has been widely 
recognized (Senge, 1990). 



1 Special thanks to Rachel Montgomery and Kristine Chong for editing and formatting support. 






Yet despite the logical appeal, the evidence on the extent and effects of data use is 
weak. Available research, in fact, suggests that some of the prerequisites to effectively 
integrating data in schools’ and teachers’ decision-making processes may be problematic. For
example, earlier studies have shown limits in schools’ access to and capacity to analyze
available data and in leadership and cultural support for change (Herman & Gribbons, 2001).
Research on teachers’ use of assessment reveals similar issues in teachers’ use of classroom 
data to inform teaching and learning: Teachers have not been trained in assessment and many 
lack the sophisticated content and pedagogical knowledge needed to interpret student 
performance and pursue effective instructional alternatives (Herman, Osmundson, Ayala, 
Schneider, & Timms, 2005; Heritage & Yeagley, 2005). 

The study reported below sought additional information about the role of data in school 
improvement and the factors and strategies that support their effective use. We started with 
the supposition that if data use is critical to school improvement, then data use practices 
ought to differentiate effective from less effective schools, and we set out to test that hypothesis.
In collaboration with a large urban district in the Pacific Northwest, we identified elementary 
schools serving low socioeconomic status (SES) students that were “beating the odds” in 
terms of fostering growth in their students’ academic performance and a comparison group of 
schools whose growth patterns were more typical for the district. Our earlier study had 
revealed that schools with higher concentrations of low SES students showed relatively less 
growth in student achievement relative to the district average, and that low performing 
students in these schools showed relatively less growth than their initially middle- and high- 
performing peers (Choi, Seltzer, Herman & Yamashiro, 2004). The current study used 
multiple methods to examine the data use practices within these Beat the Odds and typical 
schools in an attempt to explore the practices and factors that might contribute to school 
success. In the sections that follow, we describe our methodology and data sources, present 
results derived from them, and conclude with implications and next steps for research and 
practice in data use. 



METHODOLOGY 

Sampling was based on longitudinal student data available from the district. Study data 
sources and procedures included reviews of school transformation plans; observations of 
school presentations about their progress; special interviews and surveys conducted in the 
course of site visits; and available district survey data related to issues of school climate and 
culture. 






Sampling 



Sampling was based on available student-level, longitudinal data for the Iowa Test of 
Basic Skills (ITBS) in reading and math for the years 1998 to 2003. Latent variable, 
multilevel analyses were used to estimate average gains for each school in the district as 
students progressed from 3rd grade to 5th grade, for two cohorts of students — those who 
were 3rd graders in 1998 and those who were 3rd graders in 2001. The analyses explored 
school growth trajectories for students at three different levels of initial achievement: 

• Average: students who started at the mean of their schools’ performance on the
ITBS;

• Low: students who started at 15 points below the school mean; and

• High: students who started at 15 points above the school mean.

Based on these analyses, we identified Beat the Odds schools (n = 7) that were below
average in SES, that showed higher than average growth trajectories for the school as a
whole and/or for students who were low in initial achievement status, and that were relatively
consistent in performance in both reading and math and for the two cohorts. We identified as
comparison schools six schools that were similar to the Beat the Odds schools
demographically (e.g., in percentage of free lunch, ethnicity) and in initial achievement
status.
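The screening logic can be pictured with a small Python sketch. It is only an illustration: the record layout, the SES index, and the strict reading of "relatively consistent" (exceeding the district gain in every subject-by-cohort cell) are assumptions, and the numbers are made up. The study's actual estimates came from latent variable, multilevel analyses, which are not reproduced here.

```python
from dataclasses import dataclass


@dataclass
class SchoolEstimates:
    """Hypothetical record of one school's model-based estimates."""
    name: str
    ses_index: float    # assumed scale: higher means more advantaged
    gain_average: dict  # (subject, cohort) -> gain for average-start students
    gain_low: dict      # (subject, cohort) -> gain for low-start students


def beats_the_odds(school, district_ses, district_gain_average, district_gain_low):
    """Screen one school against the criteria described in the text.

    Assumption: "relatively consistent" is read strictly here, i.e. the
    school must exceed the district gain in every subject-by-cohort cell.
    """
    if school.ses_index >= district_ses:
        return False  # only below-average-SES schools qualify
    cells = list(district_gain_average)
    whole_school = all(school.gain_average[c] > district_gain_average[c] for c in cells)
    low_start = all(school.gain_low[c] > district_gain_low[c] for c in cells)
    return whole_school or low_start


# Made-up numbers purely to exercise the function.
CELLS = [("reading", 1998), ("reading", 2001), ("math", 1998), ("math", 2001)]
district_average = {c: 30.0 for c in CELLS}
district_low = {c: 28.0 for c in CELLS}
school_a = SchoolEstimates("A", -0.5, {c: 29.0 for c in CELLS}, {c: 31.0 for c in CELLS})
school_b = SchoolEstimates("B", -0.5, {c: 29.0 for c in CELLS}, {c: 27.0 for c in CELLS})
```

In this toy example, school A qualifies through the growth of its low-start students even though its whole-school gains are below the district's, which mirrors the "and/or" in the selection rule.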

All schools in the sample were Title I schools and were ethnically diverse, more so than 
the district as a whole. As seen in Table 1, the percentage of White students across these 13 
schools ranged from 3% to 59%. In most of the 13 schools (11), White students were a
small minority, less than 25% of the school. Five schools had African American student
populations of between 25% and 53%, whereas three schools (Pierce, Truman, and Polk) had 
African American student populations between 76% and 81%. Similarly, there were 
significant Asian student populations in seven of the schools, ranging between 25% and 57%. 
While Latino populations generally were similar to the district average, more than half the 
schools in our sample (7) had bilingual populations of more than double the district average 
(25% to 44%). 






Table 1

2004 Demographics: Student Ethnicity

School      Total    Grade level   American Indian   Asian   African American   Latino   White
District    46,416   K-12          2                 23      22                 11       41
Van Buren   299      Pre K-5       2                 17      13                 9        59
Carter      370      K-5           1                 54      11                 25       10
Harding     258      K-5           2                 40      40                 15       3
Hoover      533      Pre K-5       3                 21      16                 16       44
Fillmore    169      K-5           1                 18      33                 21       26
Jefferson   167      K-5           4                 38      34                 19       5
Kennedy     430      K-5           4                 34      15                 29       17
Lincoln     519      K-5           1                 59      11                 8        22
Pierce      232      K-5           1                 8       76                 9        6
Polk        134      K-5           1                 2       81                 10       5
Truman      208      K-5           2                 2       80                 7        9
Tyler       253      K-5           2                 26      53                 12       6
Wilson      295      K-5           2                 58      25                 10       5

Note. Except for “Total” and “Grade level,” all numbers are percentages. K = kindergarten.



In terms of special populations served, these schools tended to be more diverse than 
average as well. Schools in this sample had wide variation in their bilingual populations 
(from 0% to 41%). The district average bilingual population was 12%. More than half the 
schools in our sample (7) had bilingual populations of more than double the district average 
(25% to 44%). In addition, approximately 9% of district students are special education 
students and three of the sample schools served populations just slightly larger than the 
district average (10% and 11%): Fillmore, Hoover, and Tyler. 

Transformation Plan Rating Process 

Because we thought school improvement planning offered a window into schools’ use 
of data in decision making, we conducted reviews of the transformation plans that every 
school in the district was required to submit. The review encompassed 3 years of plans for 
our sampled schools, those submitted in anticipation of the 2002-2003, 2003-2004, and 
2004-2005 school years. 






Transformation Plan Rubric 



Based on a literature review of data use in schools and school performance indicators, a 
rubric was developed to address five primary components of data use: 

1. Types of evidence or indicators used;

2. Identification of goals/objectives through needs analysis; 

3. Identification of solution strategies; 

4. Analysis of progress; and 

5. Inclusion of stakeholders. 

Within each of these main components, the rubric provided for ratings on one to four 
specific quality dimensions: 

Component 1: Types of evidence or indicators used. The four dimensions captured by 
this component included: (a) breadth/range of evidence, (b) depth of analysis, (c) use of 
value-added data, and (d) technical sophistication. 

The breadth rating considered the number of different sources of information used by 
the school, ranging from reliance on state test results only to a set of evidence, such as parent 
survey data, classroom observations, classroom-based assessments, and portfolios. Depth 
addressed the detail at which schools analyzed each of their sources (e.g., ranging from just 
noting the level of particular scores to examining subject matter performance in relation to 
trends, in comparison to other subjects, and by subscales). Use of value-added data captured 
whether schools mentioned the value-added data in their planning and their perceived 
understanding of it. Finally, the technical sophistication rating addressed the appropriateness 
of the school’s data analysis strategies. 

Component 2: Identification of goals or objectives through needs analysis. This 
component was intended to measure the link between the school data and the types of goals 
and objectives set out in the transformation plan. The rating addressed the extent to which 
school goals were rooted in data on student needs and in logical remedies for those as 
opposed to goals and objectives seeming a “hodgepodge,” without much rhyme or reason. 

Component 3: Identification of solution strategies. The two dimensions that measured
this component were (a) specificity and (b) theory- or research-based/data-driven.

The specificity dimension referred to the concreteness with which the school articulated 
its solution strategies. We looked for the extent to which general strategies were 
accompanied by action plans and specific benchmarks. The theory-based dimension referred
to whether the school identified strategies ad hoc, or identified strategies based on some
theory of change, on a review of the literature, or on available evidence of effectiveness. 

Component 4: Analysis of progress. This addressed the degree to which schools 
planned for formative or periodic assessments by which they could monitor their progress. In
this component, we were looking for whether the school planned to periodically review data 
or other evidence of progress to make mid-course corrections, if need be. Essentially, we 
looked for evidence that the school planned to review data more than once a year and in 
addition to the annual state test results. 

Component 5: Inclusion of stakeholders. This component encompassed the degree to 
which various stakeholders were included throughout the transformation process. In other 
words, we looked for evidence that stakeholders (e.g., parents, Building Leadership Team 
[BLT] members, grade-level chairs, teacher aides, and community members) were part of the 
planning process. 

Improvement. The rubric thus addressed nine dimensions over the five components. 
Because each of these dimensions was rated for the three plans, we also could examine 
whether or not schools’ data use, as evidenced in the plans, was changing or improving over 
the period. To do so, we computed difference scores, comparing the first 2 years (treated
together because the two were integrally related) with the third year.
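As a rough illustration of the difference-score computation, the Python sketch below compares a dimension's third-year rating with the first two years. The report does not say how the first two years were combined; averaging them is an assumption made here, and the ratings shown are made-up placeholders, not study data.

```python
from statistics import mean


def difference_score(year1: float, year2: float, year3: float) -> float:
    """Third-year rating minus the baseline from the first two years.

    Assumption: the baseline is the mean of years 1 and 2; the report
    only says the first two years were compared with the third.
    """
    baseline = mean([year1, year2])
    return year3 - baseline


def plan_changes(ratings: dict) -> dict:
    """Map each rubric dimension to its difference score.

    `ratings` maps a dimension name to a (year1, year2, year3) tuple.
    """
    return {dim: difference_score(*scores) for dim, scores in ratings.items()}


# Hypothetical ratings for one school on three of the nine dimensions.
example = {
    "breadth": (2, 2, 3),
    "depth": (1, 2, 2),
    "specificity": (3, 3, 2),
}

if __name__ == "__main__":
    for dim, delta in plan_changes(example).items():
        print(f"{dim}: {delta:+.1f}")
```

A positive score indicates that the third-year plan was rated higher than the earlier plans on that dimension; a negative score indicates a decline.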

Transformation Plan Review Process 

Three researchers were trained to use the rubric to rate the transformation plans. An 
inter-rater reliability exercise was conducted on one randomly selected school to calibrate 
ratings. For this exercise, after initial training, raters independently reviewed the same 
school’s transformation plan. Once complete, the three raters met to discuss and resolve their 
individual ratings. Raters were asked to provide a rationale for the scores they chose, 
particularly when there were discrepancies between raters. This consensus-building process 
helped to assure that raters were each using consistent criteria. From this point, the plan 
sample was divided up between researchers so that two raters reviewed each of the remaining 
plans. After the ratings were complete, any discrepancies were resolved by subsequent 
discussion and consensus. 
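One way to picture the double-rating step is to flag every dimension on which the two raters disagree so that it can be taken to the consensus discussion. The sketch below assumes ratings are stored per dimension as integers; the function names and sample values are hypothetical. The report does not state that an agreement statistic was computed, so the percent-agreement helper is included only for illustration.

```python
def find_discrepancies(rater_a: dict, rater_b: dict) -> list:
    """Return the dimensions on which two raters gave different scores.

    Both arguments map dimension name -> rating. Dimensions rated by only
    one rater are also flagged, since they need discussion as well.
    """
    flagged = []
    for dim in sorted(set(rater_a) | set(rater_b)):
        if rater_a.get(dim) != rater_b.get(dim):
            flagged.append(dim)
    return flagged


def exact_agreement(rater_a: dict, rater_b: dict) -> float:
    """Share of dimensions with identical ratings (simple percent agreement)."""
    dims = set(rater_a) | set(rater_b)
    if not dims:
        return 1.0
    return 1.0 - len(find_discrepancies(rater_a, rater_b)) / len(dims)


# Hypothetical ratings of one plan by two raters.
a = {"breadth": 3, "depth": 2, "stakeholders": 4, "progress": 1}
b = {"breadth": 3, "depth": 3, "stakeholders": 4, "progress": 2}
```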

District Survey Analysis 

Additional information was available for all 13 schools from the district’s annual 
survey of staff climate. The survey queried respondents about school leadership, instructional 
planning and teacher collaboration, feelings of trust and respect, and communication, and
included a few questions related to data use. Items were grouped and aggregated according to
these themes. Because we had only school-level averages from the web site reports, we were
not able to empirically validate the resulting subscales (e.g., through factor analysis or some
other data reduction technique). Two researchers individually grouped the
items based on substantive content and then resolved any differences through consensus. 

Once items were grouped according to themes, we constructed averages across those 
items, based on the 5-point Likert scales used by the district. Most of the items used one of
two variants of a 5-point scale, reflecting either strength of agreement with a particular
statement or opinion, from 1 (strongly disagree) to 5 (strongly agree), or behavior frequency:
1 (never), 2 (a few times a year), 3 (once or twice a month), 4 (once or twice a week), or 5
(almost daily). One set of questions (items 4a-4d) used a 5-point scale representing the
degree to which teachers felt they had influence over particular decisions: 1 (no influence) to
5 (a great deal of influence). Because of the similarity of the scale ranges and their
direction, from 1 (low, negative) to 5 (high, positive), and in the absence of other reliable
alternatives, we combined items across these different response scales. Moreover, although 
we recognize the limits of constructing averages from scales that are not equal interval or that 
do not represent a continuous variable, it was the best analysis that could be conducted based 
on available data — that is, only school level averages. 
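To make the aggregation concrete, the following Python sketch averages school-level item means within each theme. The item identifiers, theme assignments, and values are hypothetical; only the general approach (grouping items by theme and taking an unweighted mean of the 1-to-5 item averages) follows the description above.

```python
from statistics import mean

# Hypothetical mapping of survey items to themes.
THEMES = {
    "leadership": ["1a", "1b"],
    "data_use": ["7a", "7b", "7c"],
}


def theme_scores(item_means: dict, themes: dict = THEMES) -> dict:
    """Average school-level item means within each theme.

    Items missing for a school are skipped; a theme with no available
    items is omitted from the result.
    """
    scores = {}
    for theme, items in themes.items():
        values = [item_means[i] for i in items if i in item_means]
        if values:
            scores[theme] = mean(values)
    return scores


# Hypothetical school-level averages on the district's 5-point scales.
school = {"1a": 4.0, "1b": 3.0, "7a": 2.5, "7b": 3.5}
```

Because only school-level averages are available, each item counts equally within its theme regardless of how many staff answered it, which is one reason the resulting scores support only coarse comparisons.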

A second major limitation of the survey data was the comparability of the responses 
from year to year. Although the same questions were asked, the response rates each year 
could not be established, and it is very likely that the respondents varied from one year to the 
next. As a result, it was impossible to determine whether changes in responses over time 
were the result of changes in school climate or practice or simply were the result of changes 
in respondents. The inferences that can be drawn from these data thus are very limited, and 
we used them only to identify major differences among the sample schools. For the most 
part, the responses of the 13 schools in our sample tended to follow district averages. 

Case Study Site Visits 

We planned to conduct site visits to all 13 schools in our sample. However, because of 
changes in district leadership and its commitment to the study and because the timing of our 
visits overlapped with the district testing window, only four sites agreed to participate in this 
portion of the study. We visited each of these four schools for a 2-day period in May or June 
2004 during which specially developed interviews and surveys were conducted. Interviews 
were scheduled with the principal, the school’s BLT, and two groups of teachers
divided by grade level (kindergarten to 2nd grade and 3rd grade to 5th grade). Typically two
researchers conducted the interviews and tape-recorded the conversation for future
transcription. All four interviews took place at three of the sites (Van Buren, Wilson, and
Polk). Only three interviews took place at Truman because the school’s BLT was composed
only of teachers, all of whom were included in the teacher interviews. Each principal was
interviewed individually; BLT interviews included 6-9 participants; and the group teacher 
interviews varied from 2-6 teachers at a time. In addition to the interviews, surveys were 
distributed to all teaching staff prior to the visit. To ensure anonymity, a return envelope was 
included, but most teachers opted to hand in the surveys to the researchers on site. Response 
rates at the four sites were: 39% at Van Buren; 36% at Wilson; 26% at Polk; and 56% at 
Truman (see Table 2). 

Incentives were provided for survey completion (a $5 gift certificate to Starbucks), to 
improve response rates. As another token of appreciation for participation, we also provided 
all of the staff who participated in an on-site interview with a $50.00 check. School sites 
received an honorarium of $500.00 for their participation in the study. 



Table 2

Case Study Survey Response Rate By School

School site     Certificated staff   Classified staff   Total staff   No. surveys returned   Response rate
Van Buren (2)   22                   9                  31            12                     39%
Wilson (17)     19                   9                  28            10                     36%
Polk (23)       14                   17                 31            8                      26%
Truman (40)     21                   13                 34            19                     56%
Total           76                   48                 124           49                     40%
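The response rates in Table 2 follow directly from the counts: surveys returned divided by total staff. A minimal Python sketch reproduces the rounded figures; the dictionary layout and function names are illustrative and not part of the study's tooling.

```python
# Staff counts and returned surveys as reported in Table 2.
# Tuple layout (certificated, classified, returned) is an illustrative choice.
TABLE_2 = {
    "Van Buren": (22, 9, 12),
    "Wilson": (19, 9, 10),
    "Polk": (14, 17, 8),
    "Truman": (21, 13, 19),
}


def response_rate(certificated: int, classified: int, returned: int) -> float:
    """Return surveys returned as a percentage of total staff."""
    total_staff = certificated + classified
    return 100.0 * returned / total_staff


def summarize(table: dict) -> dict:
    """Compute per-school and overall response rates, rounded to whole percent."""
    rates = {school: round(response_rate(*counts)) for school, counts in table.items()}
    # The overall rate pools counts across sites rather than averaging site rates.
    cert = sum(c[0] for c in table.values())
    clas = sum(c[1] for c in table.values())
    ret = sum(c[2] for c in table.values())
    rates["Total"] = round(response_rate(cert, clas, ret))
    return rates


if __name__ == "__main__":
    for school, rate in summarize(TABLE_2).items():
        print(f"{school}: {rate}%")
```

Pooling the counts gives 49 of 124, or about 40%, which matches the table's total row.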



All interviews were transcribed, and the Atlas.ti qualitative analysis software package was
used to code the data. To analyze the interviews, researchers created codes and sub-codes
using an inductive approach. A general review of the transcriptions informed the development
of a code set that reflected salient concepts and common responses across interviews and
respondents. Two researchers individually coded each of the interview transcripts. After
coding was complete, the researchers compared their codes, and final reliability was attained
through consensus.



Simple frequencies and cross-tabulations were computed for the teacher survey data. 
Survey data were aggregated at the school site as well as across the four sites, to determine 
overall trends. School site aggregates were compared to overall averages, to gauge whether 
any particular sites varied significantly from the overall average. Summaries of the data from 
each of the four case study sites are found in the appendix. 

In the Results section of this report, we indicate the data sources from which we were 
drawing, since we have more extensive case study data for only 4 of the 13 schools. The 
codes of the data sources referenced are shown in Table 3.



Table 3

Data Source Codes

Data source                     Code
CRESST case study interviews    CS interviews
CRESST case study survey        CS survey
District staff survey           District survey
Transformation plans            TP
Transformation plan ratings     TP ratings



Achievement Analyses 

As mentioned above, the 13 schools in this study were initially selected based on their 
performance across two time points. Specifically, the sample contrasted low SES schools that 
had made larger than average gains in the performance of students from 3rd grade to 5th
grade on the ITBS for the period just prior to the study with those that had shown more 
typical growth trajectories for the district. As the study was conducted, we sought to confirm 
the stability of our sample designation with additional data, and examined performance over 
five different cohorts for each school, from those who started in 3rd grade in 1998 to those
who were in 3rd grade in 2002. These multiple cohorts also allowed us to examine a
longer-term picture of student performance in each school:

• 1998: growth of students from 3rd grade in 1998 to 5th grade in 2000;

• 1999: growth of students from 3rd grade in 1999 to 5th grade in 2001;

• 2000: growth of students from 3rd grade in 2000 to 5th grade in 2002;

• 2001: growth of students from 3rd grade in 2001 to 5th grade in 2003; and

• 2002: growth of students from 3rd grade in 2002 to 5th grade in 2004.






